Systematic biases in parameter estimation of binary black-hole mergers 
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Parameter estimation of binary-black-hole merger events in gravitational-wave data relies on 
matched filtering techniques, which, in turn, depend on accurate model waveforms. Here we charac- 
terize the systematic biases introduced in measuring astrophysical parameters of binary black holes 
by applying the currently most accurate effective-one-body templates to simulated data containing 
non-spinning numerical-relativity waveforms. For advanced ground-based detectors, we find that the 
systematic biases are well within the statistical error for realistic signal-to- noise ratios (SNR). These 
biases grow to be comparable to the statistical errors at high signal-to-noise ratios for ground-based 
instruments (SNR ~ 50) but never dominate the error budget. At the much larger signal-to-noise 
ratios expected for space-based detectors, these biases will become large compared to the statistical 
errors but are small enough (at most a few percent in the black- hole masses) that we expect they 
should not affect broad astrophysical conclusions that may be drawn from the data. 

PACS numbers: 


I. INTRODUCTION 

Binary black hole (BBH) coalescences are corner- 
stone sources for gravitational-wave (GW) detectors, be 
they existing ground-based detectors like LIGO [1] and 
Virgo [2], planned space-based detectors such as classic 
LISA [3] or eLISA [4], or a pulsar timing array [5]. The 
analysis of GW data to detect and characterize binary- 
black- hole merger events and to test the predictions of 
general relativity requires some family of efficiently com- 
putable signal models, representing all the possible wave- 
forms consistent with general relativity. 

Modeling BBH waveforms has historically been sepa- 
rated into three regimes, divided by the different com- 
putational procedures suitable for each. The ‘ inspiral” 
where the individual black holes are sufficiently separated 
for the post-Newtonian (PN) expansion to be valid [6- 
9], the ‘ merger” of the binary where numerical relativity 
(NR) is needed [10-12], and the “ringdown” phase of a 
single, post-merger, perturbed object relaxing to a Kerr 
black hole [13, 14]. 

During the last years, the work at the interface be- 
tween analytical and numerical relativity has provided 
the community with a variety of semi-analytical inspiral- 
merger-ringdown waveform sets [15-26] of varying scope 
in parameter range and accuracy. These waveforms have 
already been used to search for GWs from high-mass [27] 
and intermediate-mass [28] binary black holes in LIGO 
and Virgo data and also to carry out preliminary param- 
eter estimation studies for ground-based detectors [29] 
and space-based detectors, such as classic LISA [30, 31], 
In this paper we shall studv a set of inspiral-merger- 


ringdown waveforms based on the Effective-One-Body 
(EOB) framework [32-36]. 

Template waveforms will always be an approximation 
to the true signals and the difference, if large enough, 
can bias inferences made from the GW data about the 
astrophysical parameters of the system or the validity 
of general relativity. Estimates of the systematic errors 
introduced by waveform approximants in the literature 
[25, 37-41] have focused only on the inspiral or used gen- 
eral conservative criteria to determine when the wave- 
form has a bias, never using the parameter-estimation 
techniques employed in actual data analysis. 

In this work, we will carry out the first measurement of 
systematic biases introduced when determining the phys- 
ical parameters of a BBH merger by using EOB wave- 
forms as templates. We do so by simulating data that 
contain NR waveforms as the “signal” to be detected. 
Then, using the Markov Chain Monte Carlo (MCMC) 
method [42, 43], we sample the posterior distribution 
function for the binary parameters using EOB waveforms 
as the templates. The characteristic width of the poste- 
rior as determined by the MCMC is taken as the statisti- 
cal error, while the distance in parameter space between 
the dominant mode of the posterior and the true, or “in- 
jected” waveform parameters is the systematic error, or 
bias, introduced by these waveforms. We study several 
different BBH systems, sampling the total mass, mass 
ratio, and signal-to-noise ratio (SNR) space for both a 
LIGO/Virgo network in the advanced detector era and a 
LISA-like configuration. 

For this first analysis we employ non-spinning wave- 
forms for quasi-circular orbits. In particular, for the in- 
jected signals, we consider the NR waveforms produced 
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by the Caltech-Cornell-CITA collaboration in Ref. [44], 
For the templates we use the EOB waveforms that were 
calibrated in Ref. [25] to those NR waveforms [44]. Be- 
cause our emphasis is on BBHs, and the merger wave- 
forms in particular, it is certainly the case that spin 
magnitude and orientation play an important role in the 
waveform and parameter estimation [45-49]. NR wave- 
forms with spins aligned or anti-aligned with the orbital 
angular momentum are available, and EOB waveforms 
that include spins have been developed in Refs. [22, 26, 
36]. However, the spinning EOB waveforms are currently 
restricted to the dominant mode and additional code 
development is needed before they can be employed in 
stochastic sampling methods like the MCMC. Thus, we 
leave to the future the extension of this study to spinning 
BBHs. 

Within these limitations, we show that the EOB wave- 
forms developed in Ref. [25] and tested here are accu- 
rate enough to introduce little-to-no significant biases 
when the data contain NR waveforms at SNRs consis- 
tent with expectations for likely LIGO/Virgo detections 
(SNR < 50). For LIS A- like detections, where the ex- 
pected SNRs are much higher than for ground-based de- 
tectors, significant biases do emerge. Nonetheless, we 
find that the discrepancies between the true and mea- 
sured parameters, at a few percent for the black-hole 
masses, are small enough to not impact key astrophysi- 
cal conclusions that may be drawn from the data (e.g., 
black-hole seed models, etc.) [4, 50, 51]. However, when 
very high accuracies are required, as when testing the 
validity of general relativity [52, 53], best-fit EOB wave- 
forms from the existing model will leave behind signif- 
icant residual power, making them ill suited for these 
applications without further development. 

The remainder of the paper is organized as follows. 
In Sec. II we describe the numerical and analytic wave- 
forms used in this work. In Sec. ni we lay out how the 
study will proceed, describing in particular the MCMC 
sampler that we use. We then discuss in detail the re- 
sults for stellar-mass BBHs in ground-based detectors 
(Sec. IV) and super-massive BBHs in space-based obser- 
vatories (Sec. V). In Sec. VI we summarize the findings 
from this work, address limitations, and discuss future 
directions to be pursued. 


II. INSPIRAL-MERGER-RINGDOWN 
WAVEFORMS USED IN THE ANALYSIS 

Our study involves comparisons between two sets of 
waveforms. We primarily seek to evaluate a continu- 
ously parametrizable family of model waveforms based 
on the EOB framework, against a discrete set of highly 
accurate numerical relativity (NR) waveforms. In anal- 
ogy with observational algorithm tests, we can think of 
the numerical waveforms as ‘ injected” signals, which we 
challenge the “template” EOB waveforms to match. 

We employ as injected signals the non-spinning NR 


waveforms produced by the Caltech-Cornell-CITA col- 
laboration [44], using the spectral Einstein code. The 
NR polarizations have mass ratio q = m\/m 2 = 
1,2, 3,4, 6 and contain -2 spin- weighted spherical har- 
monics (*,m) = (2, ±2), (2,±1), (2,0), (3, ±3), (3, ±2), 
(4, ±4), (5, ±5), and (6, ±6). These waveforms provide 
30-40 GW cycles before merger, depending on the mass 
ratio. 

The phase and amplitude errors of the NR waveforms 
vary with mass ratio and gravitational mode. The nu- 
merical errors grow toward merger and ringdown, and 
typically at merger, for the dominant (2,2) mode, the 
phase error ranges between 0.05 and 0.25 rad, while the 
fractional amplitude error is at most 1%. The subdomi- 
nant modes can have somewhat larger errors, especially 
the (3, 3) and (4, 4) modes. 

Applying the numerical waveforms to generate mock 
signal observations, we will test the ability of a previously 
published family of template waveforms to characterize 
these signals [25]. These template waveforms are based 
on the EOB framework, founded on the very accurate 
results of PN theory, an expansion of general-relativity 
dynamics in polynomials in v/c. In the EOB approach, 
however, the PN expansions are applied in a resummed 
form that maps the dynamics of two compact objects into 
the dynamics of a reduced-mass test particle moving in 
a deformed Kerr geometry [32-36], Waveforms in the 
EOB formalism are derived from such particle dynamics 
up to the light-ring (unstable photon orbit) radius. The 
subsequent ringdown portion of the waveforms is a su- 
perposition of quasinormal modes matched continuously 
to the inspiral. Tunable parameters effectively standing 
in for currently unknown higher-order PN terms are fixed 
by matching to numerical relativity simulation results. 

The comparable-mass NR waveforms in Ref. [44] were 
used, together with the small-mass-ratio waveforms pro- 
duced by the Teukolsky code in Ref. [54], to cali- 
brate a non-spinning EOB model in Ref. [25]. More 
specifically, the numerical waveforms available at dis- 
crete points in the parameter space were employed 
to fix a handful of EOB adjustable parameters enter- 
ing the EOB conservative dynamics and gravitational 
modes. These adjustable parameters were then interpo- 
lated over the entire mass-ratio space. The EOB model in 
Ref. [25] contains four subdominant gravitational modes, 
(2, ±1), (3, ±3), (4, ±4) and (5, ±5), beyond the dominant 
mode (2, ±2). 

The EOB model in Ref. [25] has been coded in the 
(public) LIGO Algorithm Library (LAL) [55] (under the 
name EOBNRv2). We carry out our study using LAL 
to generate template waveforms. Henceforth, we de- 
note the EOB model with only the dominant (2, 2) mode 
as EOB 22 , and the model that includes the four sub- 
dominant modes (2, ±1), (3, ±3), (4, ±4) and (5, ±5) as 
EOBhh- 

The phase difference of the (2,2) mode between the 
calibrated EOB model and numerical simulation remains 
below ~ 0.1 rad throughout the evolution for all mass 
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ratios considered; the fractional amplitude difference at 
merger of the (2, 2) mode is 2% and grows to 12% during 
the ringdown. Around merger and ringdown, the phase 
and amplitude differences of the subdominant modes be- 
tween the EOB and NR waveforms are somewhat larger 
than the ones of the (2, 2) mode. [The numerical er- 
rors, and phase and amplitude differences between the 
EOB and NR waveforms can be read from Figs. 6-10 in 
Ref. [25].] 

To quantify how these differences between template 
and signal would affect GW searches in Advanced LIGO, 
Ref. [25] studied the effectualness and measurement ac- 
curacy of the EOB model. When investigating the effec- 
tualness for detection purposes, they found that the NR 
polarizations containing the strongest seven modes have 
a marimum mismatch of 7% for stellar-mass BBHs, and 
10% for intermediate-mass BBHs, when only the EOB 
(2, 2) mode is included for q = 1, 2, 3, 4, 6 and binary to- 
tal masses 20-200 Hz. However, the mismatches decrease 
when all the four subdominant, EOB modes are taken into 
account reaching an upper bound of 0.5% for stellar-mass 
BBHs, and 0.8% for intermediate-mass BBHs. Thus, the 
EOB model developed in Ref. [25] is accurate enough for 
detection, which generally requires a mismatch not larger 
than 7%. 

To understand whether the EOB model developed in 
Ref. [25] is precise enough for measurement purposes, the 
authors carried out a preliminary study, adopting as ac- 
curacy requirement for measurement the one proposed 
in Refs. [40, 41]. Using a single Advanced LIGO detec- 
tor, Ref. [25] computed the SNRs below which the EOB 
polarizations are accurate enough that systematic biases 
are smaller than statistical errors. Since subdominant 
modes have non-negligible contribution for large mass 
ratios and those modes have the largest amplitude er- 
rors, they found that the upper-bound SNRs are lower 
for the most asymmetric systems, such as q ~ 6. How- 
ever, as stressed in Ref. [25], the accuracy requirement in 
Ref. [40, 41] may be too conservative and by itself it does 
not say which of the binary parameters will be biased 
and hew large the bias will be. It could turn out that the 
biased parameters have little relevance in astrophysics or 
tests of general relativity. It is the main goal of this pa- 
per to measure the actual biases of the EOB model with 
and without the subdominant modes. 

Our study is restricted to binary systems moving along 
quasi-circular orbits where the spin of each constituent 
black hole is negligible, thus reducing the model param- 
eters (f from a space of 17 dimensions to 9 dimensions: 


9 = {lnM,lnA4,lnZ?L,t p ,sinJ,a,cosi,^, ip p } . (1) 

In the above equation, M = (1 + z)(toi + m 2 ) is the red- 
shifted total mass of the binary, and M = v 3 / 5 M is its 
chirp mass, where v = miwi 2 /(mi + m 2) 2 — q/{ 1 + q) 2 
is the symmetric mass-ratio. We denote by Dj, the lumi- 
nosity distance, which, along with the right ascension 
a and declination 5, describes the location of the bi- 


nary. The orientation of the binary’s orbital angular- 
momentum vector L with respect to the line of sight k 
from the observer is encoded in the model using the in- 
clination 1 , polarization angle ip, and phase ip p — the 
Euler angles that describe the rotation from k to L. The 
parameter t p is the time. of the (2,2) mode’s maximum 
amplitude, when the phase is <p p . 

Our comparisons between the EOB waveforms and 
the NR data are restricted to the late-inspiral, merger 
and ringdown, that is roughly 30-40 GW cycles before 
merger, depending on the mass ratio. The injected sig- 
nals contain only the NR waveforms available. We do not 
match the NR waveforms to EOB or PN waveforms at 
low frequency to increase the number of cvcles because 
we do not know how well the EOB or PN waveforms 
would approximate the NR waveforms outside the region 
of calibration and we do not want to introduce unknown 
errors when estimating the systematic biases. 

There is no guarantee that a template that is in phase 
with the NR waveform during the last 30-40 GW cycles 
will remain so throughout the entire inspiral. The mass 
parameters are strongly encoded in the GW phase, so 
any additional de-phasing at earlier times than covered 
by the NR simulations can potentially increase the sys- 
tematic biases. Therefore, in order for our study to be 
meaningful, we consider binary systems that have a total 
mass such that the majority of the SNR is accumulated 
during the last 30-40 GW cycles before merger. 


III. STATISTICAL VERSUS SYSTEMATIC 
ERRORS USING MCMC TECHNIQUES 

We use the MCMC algorithm [42, 43] to produce sam- 
ples from the posterior distribution function for the wave- 
form parameters. The MCMC sampler is built on the 
foundation of Bayes’ Theorem, which, in the context of 
parameter inference, defines the posterior distribution 
function for parameter vector 6 and data d as 

Here p(-j-) are conditional probability densities with ar- 
guments on the right-hand side of the bar assumed to 
be true, p(d\9,T) is the likelihood, p(6 \T) is the prior 
distribution, and p(d|Z) is the model evidence, which, in 
parameter-estimation applications, serves only as a nor- 
malization constant. The information I denotes all of 
the assumptions that are built into the analysis, par- 
ticularly that the NR waveforms represent reality (see 
Sec. H for associated discussion). Henceforth, to sim- 
plify notation, we shall not include I when writing the 
conditioned probability density p(-|-). When comparing 
different model combinations, we adopt the notation for 
conditional probabilities with arguments to the right of 
a vertical bar representing the data and arguments to 
the left signifying which model was used as templates. 
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For example, results labeled p(EOB 22 | NR) come from the 
posterior distribution functions for models using EOB 22 
as the templates and an NR waveform as the data. 

With the posteriors, we compare the statistical error 
(the characteristic width of the posterior) to the system- 
atic error (the displacement from the injected parameter 
values of the posterior’s mode). We do not include a noise 
realization in the simulated data, as that introduces addi- 
tional biases - each noise realization pushing the best-fit 
solution away from the injected value in a different way 
— which are not easily quantified [56]. As the control 
in this experiment, we simulated (noise-free) data with 
EOB waveforms and use EOB templates for parameter 
estimation, giving us one set of results with no systematic 
bias except from the sampling error in the Markov chains 
due to their finite length. We use these controlled results 
as cole verification, and as the standard against which 
the other models’ performance is compared. We then 
test the EOB waveform models by injecting NR wave- 
forms (summed over all available modes) and using the 
two EOB models discussed in Sec. II as templates. The 
EOB 22 model is used as a baseline, as it has been em- 
ployed in LIGO/Virgo search pipelines to analyze data 
collected in recent science runs [28, 57], The EOBhh 
model is the most complete waveform at our disposal 
and is used to measure how well the EOB model could 
perform on NR data. 

In Eq. (2) we use the standard gaussian logarithmic 
likelihood lnp(d|0) = — (d — h( 6 )\d— h(d ))/2 + C, where 
C is a normalization constant that does not depend on 
model parameters and is henceforth neglected, h is the 
template, and 


i = \Jfui\a ‘-’n \J ) 


( 3 ) 


denotes a noise- weighted inner product with the sum on i 
over / (independent) interferometer channels and S\{f) 
is the one-sided noise power spectral density (PSD) for 
detector i. The bounds of integration [/mim/Nyq] are 
the minimum frequency of the NR waveform and the 
Nyquist frequency of the data, respectively. The Nyquist 
frequency is chosen to ensure that the highest frequency 
portion of the waveform is well below the instrument sen- 
sitivity curve, while f mm is set by the duration of the NR 
waveform and the total mass of the system, such that we 
only integrate over frequencies where there is numerical 
data. 

For each case, the Markov chains are run for 10 6 it- 
erations, taking about 10 3 CPU hours to complete. The 
chains rely on parallel tempering [58], differential evolu- 
tion [59], and jumps along eigenvectors of the FMI (e.g., 
see Ref. [60]) computed from PN waveforms to efficiently 
explore the posterior distribution function. We use burn- 
in times of 10 4 samples, and run several chains with dif- 
ferent initial locations to check for convergence. Prior 
distributions for all parameters are chosen to be uniform. 
Azimuthal angular parameters (a, V’> an d <£ P ) have sup- 
port over [0, 2 ir) with periodic boundary conditions, while 


declination-like angle parameters (sin <5, cost) range from 
[— 1, 1] with reflecting boundary conditions. The ranges 
for InM and InDL are chosen to be large enough so as 
to not influence the posteriors. The prior range on In M 
is coupled to InM, as the maximum value of the chirp 
mass occurs for the q = 1 [v =1/4) case, and depends 
on the total mass M of the system. Because of this, the 
prior boundary on chirp mass does affect the posteriors 
for the equal-mass systems considered in this work. 

The products of our analysis procedure are samples 
from the posterior distribution function p{ 6 \d) — an 
oddly shaped, sometimes multimodal, blob living in a 9D 
space. There is no perfect way of distilling this informa- 
tion into a simple, robust, statistic to assess parameter- 
estimation accuracy. We will make do with the “frac- 
tional systematic error” S0g. For parameter 9, we first 
define the systematic error 8q = |#map — 0o| where <?map 
is the maximum a posteriori (MAP) value and 0 O is the 
injected value, while the statistical error is quantified by 
the standard deviation of the ID marginalized posterior 
distribution function a#. We then define the fractional 
systematic error as the ratio between fie and the statis- 
tical error: 

«&=-. ( 4 ) 

<79 

We consider templates that consistently yield 6 (80 < 1 as 
introducing negligible bias assuming the NR waveforms 
are exact, winch, as seen in Sec. n, is not the case 1 . 

The fractional systematic error (4) can be interpreted 
as the number of standard deviations away from the in- 
jected value at which we find the MAP waveform. This 
choice of statistic is not perfect — low-SNR systems have 
very non-gaussian posteriors making the standard devia- 
tion a poor choice for characterizing the statistical error. 
Furthermore, the MAP parameters are a single point and 
tell us nothing about how large a region in parameter 
space had similar posterior support to the current best 
estimate. Additionally, the MAP value is a feature of 
the full 9D posterior, while the variances are computed 
from the marginalized posterior distribution functions. 
This introduces complications for some special cases as 
we shall discuss in detail below. 


IV. RESULTS FOR ADVANCED LIGO 
DETECTORS 

The first test of the EOB waveforms uses simulated 
data from the network of advanced ground-based detec- 
tors expected to come on line in the middle of this decade: 
the two LIGO detectors in the USA and the Virgo de- 
tector in Italy. We use the same noise PSD for each in- 
terferometer, the “zero-detuned high-power” curve from 


1 Currently we have no way of incorporating the numerical error 
of the NR waveforms into our estimate and so we neglect it. 
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Ref. [61], which is the sensitivity curve for the fully com- 
pleted Advanced LIGO detector. The GW response in 
each interferometer is modeled by convolving the GW 
signal with the beam-pattern function for that detector 
and applying the appropriate time-delays between inter- 
ferometers [62]. 


A. Choice of binary configurations 

We study several binary configurations using different 
mass ratios, total masses, and S NRs. The SNR of the 
system is computed via SNR = y/(d\d), and its value is 
controlled bv adjusting the luminosity distance Lfr. Be- 
cause our simulated data d contain no simulated noise, 
the SNR is simply the inner product of the injected wave- 
form with itself. Also, our definition of the inner prod- 
uct in Eq. (3) includes a summation over all interferom- 
eter channels, thus we quote the network SNRs for the 
ground-based studies. 

The first three panels of Fig. 1 show the time-domain 
EOBhh waveforms (blue, dotted) and whitened by the 
noise spectral density (red, solid) for three representative 
cases studied here. The vertical lines indicate intervals in 
which 10% of the signal power is accrued, starting from 
/min for each system, with the rightmost line indicating 
where 99.9% of the power has accumulated. The power 
intervals are included as a guide to see which portions of 
the waveform contribute most to the parameter estima- 
tion. For instance, the q = 1, M = 50 M 0 waveform 
acquires 50% of the signal power in the last 1000 M of 
the signal, while the q — 2, M = 23 M© signal is much 
less dominated by the late-inspiral and merger, account- 
ing for only 30% of the power. The 9 = 6, M = 120 Mq 
examples are most influenced by the latter stages of the 
signal, which make up over 70% of the power. 

The bottom-right panel of Fig. 1 shows the strain spec- 
tral densities for the same three systems, now using the 
NR waveforms used in this study. Also included is the 
Advanced LIGO power spectral density. 

We focus on moderately high-mass black-hole merg- 
ers with M ~ 50 Mq (e.g., the equal- mass case shown 
in Fig. 1: top-left panel, red solid curve in bottom- 
right panel). Beyond their potential as Advanced LIGO 
sources 2 , high- mass systems serve an important role in 


2 A binary with M ~ 5OM 0 is astrophysically relevant, as 

the largest black-hole mass ever observed is in the range of 
23-34 Mq [63, 64]. Recent results from population-synthesis 
studies suggest that massive, low-metalliclty stars are capable 
of producing black holes as large as M ~ 80 Mq [65], although 
these findings are for single stars only, and binary evolution could 
either increase or decrease the maximum black-hole mass. Addi- 
tionally, there exists at least one example of a massive Wolf-Rayet 
star, R136al [66], with M «v 250 Mq at a distance of ~ 0.1 Mpc 
and with sufficiently low metallicity to produce a massive black 
hole. However, it cannot be excluded that the star goes instead 
through a pair-instability supemorae leaving no remnant. 


testing the waveform models for two reasons: First, as 
explained in Sec. II, the NR signals are short in duration 
and we do not supplement the waveform by hybridiz- 
ing the numerical data with analytic inspiral models at 
low frequency. We therefore require higher-mass sys- 
tems, merging at lower frequency, to ensure that most 
of the inspiral missing from the NR data will fall outside 
the sensitive measurement band of the detector. With 
M ~ 50 Mq NR waveforms start at ~ 30 Hz, setting 
/min in the inner product defined in Eq. (3). Comparing 
these data to EOB waveforms with the same parameters 
but / m in = 10 Hz (below which the Advanced LIGO sen- 
sitivity is very poor), we find the NR waveforms contain 
~ 85% of the total signal power, or ~ 90% of the total 
SNR. 

A second reason for focusing on M ~ 50 Mq for the 
binary is that these systems are “centrally located” in 
frequency over the most sensitive band of the advanced 
detectors ( (V 30 to 10 3 Hz, e.g., see the bottom-right 

panel in Fig. L), such that inspiral, merger, ringdown, 
and additional modes all contribute to the overall signal 
power and, accordingly, the parameter-estimation capa- 
bilities. Generally speaking, we expect such systems to 
make the greatest demands on complete inspiral-merger- 
ringdown waveform model accuracy. 

While the M ~ 50 Mq systems serve as the basis 
for our comparisons, we include additional examples to 
probe regions of signal space that are of particular in- 
terest. These include a q = 6, 12OM 0 system (Fig. 1: 
bottom-left panel, blue dotted curve in bottom-right 
panel) chosen such that the subdominant modes con- 
tribute the most, as they will be most pronounced at high 
mass ratios, and signal power from the higher frequency 
modes is still in the sensitive band of the detector. At 
this mass, / m j n = 10 Hz, so our analysis is not missing 
any signal power due to the length of the NR data. 

We also go to lower masses, using a q = 2, 23 M© 
binary (Fig. 1: top-right panel, green dashed curve in 
bottom-right panel) as a more likely LIGO /Virgo de- 
tection, to demonstrate the EOB models’ parameter- 
estimation accuracy not at the extremes of a potential 
binary signal, but within reasonable expectations of what 
the coming data may hold (apart from including the 
black- hole spins). It is worth noting that for these low- 
mass systems we are missing a large portion of the in- 
spiral, as / m i n = 60 Hz and ~ 30%d of the full SNR will 
be accumulated below that frequency. Therefore, these 
results might change in the future when longer EOB and 
NR waveforms become available. 


B. Results on systematic biases at fixed inclination 
angle 

In Fig. 2 we plot the ID marginalized posterior dis- 
tribution functions for each parameter for the case of a 
binary with mass ratio 9 = 2, total mass M = 51 M©, 
and network SNR = 48 produced using an MCMC sam- 
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q = 1 , M = 50 



q = 2, M = 23 



q = 6, M= 120 




PIG. 1: [Top row, and bottom-left] Time-domain EOB waveforms (blue dotted) and the same signal whitened by the noise 
spectral density (red solid) representing different test cases studied in this work. The time-axis is scaled by the total mass 
and shifted by the merger time. The horizontal dotted lines demarcate 10% intervals for accumulated signal power, with the 
rightmost line at the 99.9% mark. [Bottom right] Strain spectral densities \h(f)\ showing the estimate for the Advanced LIGO 
noise curve (gray dotted line) and the NR waveforms for the same representative examples as the other panels. 


pier. The inclination angle is chosen to be t ~ tt/3. 
The independent variables in these plots are AO = 
0 — 0j n j, where 0j n j are the injected parameter values. 
The pt'EOB 22 |EOB 22 ) histograms (green, dashed lines) 
are the posteriors using the EOB 22 waveforms for both 
the signal and the templates. These confirm that the 
MCMC sampler is working properly, as the posteriors all 
show strong support for the injected waveform param- 
eters (peaking at or near 0) and statistical errors con- 
sistent with results in Ref. [29] obtained using FMI es- 
timates and phenomenological inspiral-merger-ringdown 
waveforms. 

The red (solid) lines and blue (dotted) lines are for 
data containing an NR signal and the EOB 22 and EOBhh 
waveforms as templates, respectively. The bottom-right 
panel shows the logarithmic likelihood distributions for 
each chain. We can see from these posteriors that the 
EOB 22 waveform is significantly biased away from the 
NR injected value, by ~ 1% in both M and ,\4. This 


bias is substantially reduced when including the subdom- 
inant modes in the EOB template, to the point where 
the systematic error is well within the statistical error 
of the posterior. For the extrinsic parameters such as 
distance and sky-location, the posteriors for the approxi- 
mate templates are nearly identical to those produced by 
using the exact same waveform for both data simulation 
and parameter estimation. We also see the role that sub- 
dominant modes play in breaking the 7r/2 degeneracy in 
the polarization angle ip, which can aid in distance and 
sky-location determination for some systems. 

The SNR o f the residual d — h, given by SNR r „ = 
y/— 21np(d]0) can be inferred from the bottom-right 
panel. Viewed in this way, there is a distinct excess in 
the residual for even the EOBhh case (blue, dotted) in 
comparison with the idealized control MCMC residuals 
(green, dashed). To understand the significance of this, 
consider applying a detection threshold of SNR = 6 [67] 
for the residual waveform. This corresponds to a maxi- 
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FIG. 2 Example marginalized posterior distribution functions for a binary system with mass ratio q — 2, total mass M = 5 1 M© , 
and network SNR = 48 produced using an MCMC sampler. The green (dashed) histograms arise from injecting the EOB 22 
waveforms and using the same model as the template. The red (solid) distributions use the EOB 22 waveform while the blue 
(dotted) curves use the EOBhh waveform to recover an NR injection. The s-axis shows the distance away from the injected 
parameter value, so distributions that peak at zero show no bias. The bottom-right panel shows the logarithmic likelihood 
distributions, which are used to quantify the amount of residual power left behind by the waveform model. The y-axes have 
be normalized to range between [0, 1]. 


mum logarithmic likelihood of < —18, below which the 
residual could potentially contain enough power to be de- 
tected after the best-fit waveform is regressed from the 
data. Suppose in the near future we have model wave- 
forms at our disposal containing all of the details of black- 
hole mergers (i.e. spins and eccentricity) that generally 
produce SNR res < 6 for equally detailed NR simulations, 
and yet coherent residuals are consistently found in the 
data. Such an event could suggest a possible departure 
from general relativity. 

The results from our MCMC studies for different sys- 
tems are displayed in Tables I and II, which show the 
fractional systematic error 50 (defined in Sec. Ill) for 
the mass, sky location, and distance parameters. The 
injected waveforms again had an inclination angle of 
l - 7t/3. We include our estimate of the statistical error 
ere to quantify the precision of the advanced detectors for 
the systems considered here. The standard deviations 


should be interpreted with caution; we do not include 
noise in the simulated data, so the deviations are not rep- 
resentative of the “error bars” on a particular detection, 
but instead represent an ensemble average over idealized 
Gaussian stationary noise realizations of the statistical 
error for these particular systems. 

Table I contains the intrinsic parameters — those that 
affect the shape of the waveform. Because we consider 
non-spinning black holes, the only intrinsic parameters 
are the masses. The extrinsic , or observer-dependent, 
parameters (i.e., distance and sky-location) are given in 
Table II. They are encoded in the instrument response 
to the GW, instead of being imprinted in the phase and 
amplitude evolution of the waveform itself. We do not 
report on the orientation parameters a and ib or reference 
time f p and phase (p p parameters in this fashion, but 
note that results for these other parameters are consistent 
with the extrinsic variables in Table II. 
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q 

M (Me) flow (Hz) SNR 

ain u 

p(EOB|EOB) 

Oin M Sftln M 

Spin M 

ain M 

p(EOB 22 |NR) 

d\nM SP\nU Spin M 

0"Ln M 

p(EOBhh|NR) 
G’ln.M fif3\n M 

M 

1 

50 

BTiM 

12 

0.02 


0.11 

0.05 

0.02 

0.02 

0.34 

0.17 

■*.71 

0.02 


0.10 

1 

50 

30 

48 

4 x 10 -3 

3 x 10 “ 3 

1 x 10 -3 

0.14 

0.01 

0.01 

1.76 

0.84 

2 x 10“ 

3 2 x 10" 3 

0.79 

0.87 

2" 

23 

60 

12 

0.02 

0.01 

0.01 

0.02 

0.03 

0.02. 

0.27 

0.07 

0.03 

0.02 

0.18 

0.26 

2 

51 

30 

12 

0.03 

0.02 

3 x 10" 3 

4 x 10“ 3 

0.03 

0.02 

0.47 

0.28 

0.03 

0.02 

0.01 

0.01 

2 

51 

30 

48 

0.01 

0.01 

0.01 

0.01 

0.01 

0.01 

1.92 

1.39 

0.01 

0.01 

0.93 

0.32 

6 

56 

30 

12 

0.03 

0.03 

0.03 

0.08 

0.03 

0.03 

0.94 

0.66 

0.02 

0.02 

0.58 

0.39 

6 

56 

30 

48 

0.01 

5 x 10“ 3 

0.23 

0.04 

0.01 

0.01 

3.47 

2.51 

0.01 

4 x 10 -3 

1.43 

0.84 

6* 

120 

10 

12 

0.03 

0.03 

0.05 

0.14 

0.03 

0.03 

1.67 

0.60 

0.02 

0.02 

0.29 

0.06 


TABLE I: Fractional systematic biases <5/3 (see Eq. (4)) and statistical errors a for intrinsic parameters as determined by the 
MCJ.IC sampler. An asterisk in the mass ratio column indicates examples where EOBhh was used for the p(EOB|EOB) study. 
All other examples used the EOB 22 waveform. 


FYom Table I we can see that generically, the EOB 22 
waveforms are not as accurate as the EOBhh waveforms, 
which include the subdominant modes. This is true even 
for comparable-mass systems, where the subdominant 
modes only minimally contribute to the overall waveform 
power. The bias introduced by neglecting additional har- 
monics is not due to missing waveform power as much as 
it is caused by phase differences between a quadrupole- 
only template and the full NR data, as coherent matched 
filtering analysis me typically more sensitive to phase 
than amplitude. 

The parameter estimation accuracy of the EOBhh 
model up to SNR ~ 50 exceeds expectations from 
Ref. [25], as can be seen by focusing on rows 2 and 7 
in Table I. Here we find systems chosen specifically to 
compare with Fig. 15 in Ref. [25] where, based on the 
accuracy requirement proposed in Refs. [40, 41], they 
predicted that systematic error could exceed statistical 
error at single-detector SNR ~ 35 (q = 1, M = 50 Mq) 
and ~ 11 (q = 6, M = 56Mq). 3 

Our analysis uses the LIGO/ Virgo network of de- 
tectors, as opposed to the single-detector studies from 
Ref. [25]. This difference will not heavily impact the 
results, as it is the measurement of intrinsic parameters 
that is most affected by differences in the waveform model 
due to both the accuracy with which they are measured, 
and the way they are encoded in the phase evolution of 
the signal. Measurement of the intrinsic parameters is 
not greatly influenced by the inclusion of additional de- 
tectors in the network (at a fixed SNR). Our findings 
show that, even at SNRs that are rather high for an ex- 
pected LIGO detection, the EOBhh model introduces 
systematic errors that differ by < la from the injected 
parameters. 

The extrinsic parameters, on the other hand, are in- 


3 The accuracy criterion used in Refs. [25, 40, 41] is a 'sufficient' - 
but “not necessary” requirement for parameter estimation, and 
it does not say which of the binary parameters will be biased and 
how large the bias will be. Thus, the authors of Ref. [25] were 
making conservative judgments about the waveform accuracy. 


ferred mostly from the overall amplitude of the wave- 
form, which is not as well measured as phase, and the 
time-of-arrival of the signal at each detector. We thus 
expect that the extrinsic parameters, determined with 
lower fidelity than the masses, will be better able to toler- 
ate small differences between template waveform models 
within the statistical error. Adding additional detectors 
to the network dramatically improves the statistical er- 
ror for extrinsic parameters, mostly due to the increased 
baseline [68], but not to the point of becoming influenced 
by the waveform systematics. 

Indeed, we find that the relative systematic biases for 
extrinsic parameters are generally smaller that those of 
the intrinsic parameters.For systems with q> 2, regard- 
less of the SNR or the EOB model, the systematic errors 
are consistently smaller than the statistical errors, even 
when the (2,2)-only waveform is used as the template. 
This is evident in Fig. 2, where the Dl, sin <5, and a poste- 
riors are nearly indistinguishable, despite the significant 
difference in the residual left behind by the waveform 
model, as shown in the bottom-right panel containing 
the logarithmic likelihood distributions. The same can 
not be said for the equal-mass cases (top two rows in 
Table II), where we encounter a subtle effect from our 
choice of statistic, <5/3. 

For the SNR = 12, equal-mass case, the 8,6 statistic 
breaks down and results are omitted from the table. At 
such a low signal strength, the orientation parameters 
axe very poorly measured, with the polarization angle ip 
effectively unconstrained. These large measurement un- 
certainties cascade through the ID posteriors via strong 
il> — 1 and 1 — Dl covariances. We are left with an un- 
constrained D[, distribution that is poorly characterized 
b> the variance, and large stochastic variation from one 
Markov chain run to the next as to where the MAP 
parameters lie. This degeneracy is evident in Fig. 3, 
where we show the 2D marginalized posterior distribu- 
tion function of the 1/1 — cos 1 , plane (left panel) from a 
p(EOBhh|NR) run, and the maximum logarithmic like- 
lihood found in the Markov chain for different bins in 
Dl space (right panel) from both p(EOB 2 2 |EOB 2 2 ) and 
P(EOBhh|NR). We see a virtually flat distribution of the 
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q 

M (M g ) / lo . 

. (Hz) SNR 

Osin & a* (rad) 

p(EOB|EOB) 
trin D l 5 /Sir- Dl. 5 Ain 5 6/3 a 

p(e6b 22 |nr) 

(TItiDl <5$lnDx, Sfi&inS 

<5 /So 

p(EOB HH |NR) 

CTln »£, (SfiluDt <5 Ain 6 

sp« 

l 

50 

30 

12 

0.06 

0.04 


— 


— 

— 

— 


— 


— 

— 

— 

l 

50 

30 

48 

0.03 

0.01 


0.11 



0.12 

0.12 

0.17 

0.03 

0.20 

0.64 

0.07 

0.30 

2* 

23 

60 

12 

0.06 

0.03 

0.24 


0.04 


0.27 

0.28 

0.02 

0.06 

0.25 

0.30 

0.05 

0.08 

2 

51 

30 

12 

0.06 

0.04 

0.28 

0.03 



0.29 

0.21 

0.28 

0.07 

0.27 

0.60 

0.15 

0.33 

2 

51 

30 

48 

0.03 

0.02 

0.12 


Irojfj 


0.14 

0.11 

0.19 

0.11 

0.11 

0.07 

0.04 

0.05 

6 

56 

30 

12 

0.07 

0.05 

0.29 


■jrcl 

0.21 

0.29 

0.28 

0.14 

0.09 

0.25 

0.40 

0.41 

0.39 

6 

56 

30 

48 

0.03 

0.02 

0.14 



lima 

0.19 

0.28 

0.14 

0.09 

0.09 

0.78 

0.27 

0.21 

6* 

120 

10 


0.08 

0.05 

0.21 

■OEM 


0.13 


0.86 

0.37 

0.54 

0.22 

0.36 

0.02 

0.16 


TABLE II: Same as Table I, except here we show a subset of the extrinsic parameters corresponding to the binary’s location. 
Because of their similarity between each run, the statistical errors sue displayed once but apply to each example. Results for 
the g=l, SNR = 12 example are omitted due to the failure of our <5/3 statistic. 


maximum logarithmic-likelihood values between ~ 0.5 to 
~ 2.25 Gpc, with well over half of the allowed parameter 
space in the y — cos /, plane receiving significant posterior 
support. The injected value of Dl was near 1 Gpc. 

The over-density at {i/», cost} ~ {tt/ 3, 0.5} corresponds 
to the injected parameter values, with the 7r/2 — shift 
degenerate mode still appearing despite the inclusion of 
subdominant modes. Recall that this is an equal-mass 
system, where the subdominant modes are the least no- 
ticeable. The over-density at cost ~ 1 is due to the 
Markov chain preferring template waveforms with min- 
imal contribution from subdominant modes (to match 
the strictly equal-mass injection) as the sampler explores 
higher mass ratios, up to q ~ 3 in this case. Systems with 
cos t = k ■ L = 1 are face-on and it is this configuration 
where the subdominant modes are least prominent. 


C. Dependence of the results on the inclination 
angle 

Due to the high computational cost of each MCMC 
run, we are not able to Monte Carlo over a large popula- 
tion of binary systems. We instead have chosen extrinsic 
parameters away from the extremes of parameter space. 
This means sky-locations that are away from nulls in any 
detector’s response, and inclinations (i ~ 7r/3) that were 
not edge- or face-on with respect to the observer’s line of 
sight. 

One of the more interesting results from this study is 
the impact of the subdominant modes on the parameter- 
estimation capabilities of ground-based detectors. The 
role that the additional modes play in the waveform de- 
pends heavily on both the mass ratio and the orientation 
of the binary — edge-on systems have the largest contri- 
bution from the additional modes, while face-on systems 
are most dominated by the (2,2) mode. It is therefore 
possible that, for some more extreme orientations, sys- 
tematic biases could become large due to the increased 
importance of the additional modes. 

To allay this concern we performed a series of MCMC 
runs on a system where the subdominant modes would 


play an important role, exploring the edges of orientation 
space for each run. We chose the M = 120M©, q — 6 
system (row 8 in Tables I and II) and analyzed three dif- 
ferent orientations: edge-on (c — 7t/2), face-on (/, = 0), 
and moderate tilt (i = tt/ 3). We compare the Ain M 
and Ain Ad posteriors for each of these systems using 
the EOBhh model as a template to study data contain- 
ing an NR waveform injected at SNR= 12. The results in 


i 

^ln M 

SfilnM 

^ln .M 

<5y3ln M 

|%MR %HH 

0 

0.03 

0.10 

0.04 

0.12 

55 


7t/3 

0.02 

0.29 

0.03 

0.06 

59 

7.5 

■k/2 

0.02 

0.18 . 

0.03 

0.01 

60 

10 


TABLE III: Fractional systematic errors and statistical errors 
for InM and In At when M = 120A/©, q = 6, SNR = 12 and 
for three different inclinations: Edge-on (/, = x/2), face-on 
(t = 0), and an intermediate orientation (i = n/3). We also 
include the percentage that the merger (MR) and additional 
modes (HH) contribute to the total SNR. 


Table III show the fractional systematic error well below 
unity for each orientation regardless of the inclination 
angle. We ‘also include the percentage of the total SNR 
that comes from the merger and ringdown of the wave- 
form (MR) and the additional modes (HH). This result 
confirms that the parameter-estimation accuracy of the 
EOB model is robust to different orientations, and thus 
different strengths of the additional modes. 


D. Simulating a detection 

All of the above results have been performed on simu- 
lated data that do not contain any noise, but do include 
the noise PSD in the inner product defined in Eq. (3). 
Thus the posteriors that we generate are not represen- 
tative of a probability density function for an actual 
GW measurement, but instead are the hypothetical aver- 
aged measurements of the same system in an ensemble of 
noise realizations [69-71]. To more realistically demon- 
strate the parameter-estimation capabilities of advanced 
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p(EOB 2z |EOB22) p(EOB hh |NR) 



D u (Gpc) 


PIG. 3: Selected results from the q = 1, SNR — 12 run to exhibit the breakdown of S3 as a useful statistic. The left-hand 
panel shows the 2D marginalized posterior for the orientation angles ip and cos l, with darker colors corresponding to higher 
probability density. The right-hand panel displays the maximum logarithmic likelihood as a function of Dl, which is effectively 
uniform between ~ 0.5 and ~ 2.25 Gpc. 


ground-based interferometers, we want now to simulate 
a single LIGO/ Virgo detection. 

To that end, we use again the binary configuration 
with q = 2, M = 51M© and network SNR of 12, but now 
add stationary, gaussian noise to the NR waveform using 
the same PSD as in the noise-free study. The resultant 
posteriors are then representative parameter-estimation 
products, subject to the following important caveats: 

• We use the same PSD for each detector when, 
in practice, each interferometer will have different 
sensitivity at any given time. Furthermore, the 
Virgo design sensitivity is not identical to LIGO 
(although it is qualitatively similar). We also effec- 
tively introduce a noise “wall” at 30 Hz to account 
for the limited duration of the NR data. 

• We do not include any calibration errors in the 
waveform injections, which could prove to be a 
significant contribution to the overall parameter- 
estimation error budget [72]. Furthermore, we do 
not account for intrinsic error in the NR waveforms. 

• We recognize that simulated additive gaussian 
noise is different from injecting waveforms into real 
LIGO/Virgo noise [56]. 

For this study we find the 2D marginalized posterior 
distribution functions to be of the most interest. We show 
results for the sky-location in Fig. 4 and mass parameters 
in Figs. 5 and 6. 

In Fig. 4, the sky-location posterior is shown in a Mol- 
weide projection with the detector locations projected on 
to the celestial sphere. The white, dotted lines show the 
circles of constant time delay between each pair of de- 
tectors. The posterior should sit at intersections, and 
the principal axis should lie along a line. A small white 
square is included, centered on the injected position. The 


injected values for the sky location are contained within 
the ~ 63% confidence interval of the posterior (the red 
region of the error ellipse). The injected sky location was 
chosen to be a region where the SNR in each detector was 
roughly equivalent. 

Of more pertinence to this study are the mass poste- 
riors. While In M and In M are the most convenient pa- 
rameters for the MCMC sampler, being the most orthog- 
onal, they are not of the most interest to the wider as- 
trophysical community. A better data product would be 
posteriors on either the individual masses mi and m 2 , or 
the total mass and mass ratio. In post-processing we take 
the MCMC chains and compute the relevant mass param- 
eters at each step in the chain. The injected component 
black holes have masses mi = 34 M© and m 2 = 17 Mq. 
We show the 2D marginalized posterior distribution func- 
tions for the the mi^m 2 (Fig. 5), and M-q plane (Fig. 6), 
where the color corresponds to the posterior density. 

These figures give a good depiction of just how cor- 
related the mass parameters are with one another, and 
how much of parameter space is supported by the chain 
in part due to that strong correlation. For example, the 
M-q plane has significant support for mass ratios be- 
tween 1 and 3, compatible with previous LIGO MCMC 
studies using PN waveforms (e.g., see Ref. [48]). These 
are the type of parameter-estimation products that the 
astrophysics community can anticipate as the advanced 
detectors come on line in the coming years. 


V. RESULTS FOR SPACE-BASED DETECTORS 

For EOB waveforms that include subdominant modes, 
we have found relatively small systematic errors in 
parameter-estimation results for ground-based observa- 
tions with SNR < 50. Because ground-based GW instru- 
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FIG. 4: 2D marginalized posterior distribution function for the sky-location of a q — 2, M = 50 M©, binary with SNR = 12 
injected into simulated stationary gaussian noise colored by the Advanced LIGO noise power spectral density. The red, yellow, 
and cyan regions roughly correspond to 1, 2, and 3<x confidence regions. The white box represents the injected sky location 
of this source. The position of each interferometer in the network are projected onto the sky, and depicted with H,L, and V 
for LIGO Hanford, LIGO Livingston, and Virgo. The white, dashed fines show the locations that yield the same time-delay 
between each pair of detectors for the injected sky position. 


P( h EOB ih Nfi) 



P( h EOB lh NR) 



48 49 50 51 52 53 54 55 56 
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FIG. 5; 2D marginalized posterior distribution function for 
the individual masses mi and m 2 of a q = 2, M = 51A/©, bi- 
nary with SNR = 12, injected into simulated stationary gaus- 
sian nc se colored by the Advanced LIGO noise power spectral 
density, The injected values for {mi, m 2 } were {34, 17} A/© • 


ment rates are limited by sensitivity, higher-SNR events 
are exceedingly unlikely in the first generation of detec- 
tions. Proposed space-based instruments will be sensi- 
tive to supermassive black-hole (SMBH) mergers out to 
cosmological scales, such that a significant fraction of de- 
tected events may have SNR > 100. Space-based in- 
struments are typically sensitive to these events over a 
broad bandwidth covering a large number of cycles lead- 
ing up to merger [4, 73]. Such observations will make 


FIG. 6: Same as Fig. 5, but now depicting total mass M and 
mass ratio q. 

much greater demands on the accuracy and efficiency of 
inspiral-merger-ringdown waveform templates. Though 
considerable effort has gone into estimating the ability 
of space-based instruments to measure astrophysical pa- 
rameters assuming accurate waveforms, very little has 
been done to assess the template requirements for these 
future observations. Here we make a limited exploration 
of the capability with the current numerical-relativity 
and EOB waveforms to pursue space-based observations. 

Of several proposed space-based GW interferometer 
instruments [3, 4, 74 , 75], the best studied concept is 
the classic LISA mission [3] . While acknowledging, that 


density 
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there is currently considerable uncertainty about when 
and how the first space-based GW instrument will be de- 
veloped, we choose to study the classic LISA configura- 
tion to make contact with the large body of work that has 
already been dedicated to black-hole merger parameter 
estimation (e.g., see Refs. [30, 31, 76-78]). To compare 
with other concepts, the most relevant alteration from 
the classic LISA design is the arm-length (e.g., from 5 
Gm for classic LISA down to 1 Gm for eLISA), which 
sets the overall scale for parameter-estimation capabili- 
ties. Our results for a more modest detector configura- 
tion would be very similar to those for classic LISA, after 
appropriately rescaling the total mass of the black-hole 
system. 

We follow here the same procedure outlined for the 
LIGO/Virgo studies in Sec. IV, where NR waveforms are 
injected into simulated noise-free data, and the signals 
are analyzed using the EOB model as a template. Be- 
cause we saw significant bias in the EOB22 model at SNR 

50 it is safe to assume that those errors will only grow 
with SNR, and so we focus these runs only on the EOBhh 
model. 

The duration of the available NR data restricts us to 
brief LISA observations, for which we can apply the static 
limit for the detector; thus we neglect LISA’s orbital mo- 
tion during the observation time. Consistently, we focus 
on systems of mass 3 x 10 7 Mq at the high end of LISA’s 
sensitive range. For such observations the maximum fre- 
quency attained by the merger signal is w'ell below the 
transfer frequency of the detector (when the wavelength 
of the GW signal is comparable to the size of the detec- 
tor). In this low-frequency, static regime, the instrument 
response is equivalent to two 60° Michelson interferome- 
ters, co-located, and misaligned by 7r/3 radians. The an- 
tenna patterns for this configuration, and the discussion 
of the two limits applied here, can be found in Ref. [79]. 

We consider mass-ratios in the range 2 < q < 6 ob- 
served at SNR=100, which would make these unusu- 
ally distant for LISA observations. As shown in Fig. 7, 
the power-spectral density is similar over this mass-ratio 
range with more structure at q = 6. The noise-weighted 
waveforms for these cases are most comparable to the 
largest-mass I2OAL5 LIGO case that we studied. Space- 
based instruments are not expected to have a strong 
power-law slope like the seismic noise wall in the LIGO 
sensitivity curve, meaning that even for large masses 
there is a softer degradation of sensitivity going back 
to the early portions of the signal, making our / m ; n cut 
somewhat more artificial here. 

Generally the liigher SNRs of our nominal LISA obser- 
vations would predict larger bias from systematic errors 
in the template waveforms. Because of the differences be- 
tween the sensitivity curves and response for LISA and 
LIGO, however, it is not straightforward to scale up such 
expectations. Table IV shows our results for the param- 
eter biases for mass M and chirp mass M. Unlike the 
LIGO results, the biases here are already significant in 
most cases for LISA observations at SNR=100, reaching 


a few times the statistical error level. In Table IV we also 
provide the SNR of the residuals after the MAP wave- 
forms are removed from the data. These residuals with 
SNR > 6 would be detectable and could therefore lead 
to biases in estimates of overlapping signals. Residuals 
at this level would also limit the utility of the current 
waveform templates for studies aimed at testing general 
relativity [52, 53]. 

For LISA, however, such a system would have to be 
exceedingly distant, at redshift z > 20 or more, in order 
to expect an SNR as small as 100 [78]. Actual SNRs could 
be as much as 100 times larger, and we would expect 
correspondingly larger relative biases and residuals. Full 
interpretation of LISA data would thus require higher 
levels of template accuracy. Even the level of errors in 
the numerical simulations used here in place of the exact 
predictions of general relativity are far too large to avoid 
biasing such high-SNR measurements. 

That being said, while the statistical error should de- 
crease linearly with increasing SNR (causing 63 to grow), 
the systematic error should remain approximately the 
same, and the absolute biases in the mass parameters 
are small. Reading from Table IV we see that the sys- 
tematic errors in M and M are < 1%. Converting these 
into the individual masses, the biases on mi and m2 are 
at most a few percent. While the MAP mass parameters 
may end up many a away from the true values and would 
thus fall short of an optimal analysis of LISA data, such 
~ 1% errors in the masses are still small in astrophysical 
terms and may have little impact on key inferences made 
about the massive black-hole population. For example, 
when trying to constrain black-hole formation scenarios 
using simulated eLISA BBH catalogs, Amaro-Seoane et 
al [4] (based on the procedure in Ref. [51]) endured sta- 
tistical errors as high as ~ 1% and were still able to easily 
discriminate between black hole seed models. 

The limitations of our current analysis prevent us from 
providing any detailed indication of how far template ac- 
curacy must be improved for space-based observations. 
For the detector configuration used here, the extrinsic 
parameter estimation at SNR = 100 is actually worse 
than that for the ground-based detectors. Space-based 
detectors rely on the long duration of the signals, the am- 
plitude and Doppler modulations caused by the orbital 
motion, and finite-arm-length effects, to break degenera- 
cies among the system parameters. We are limited here 
to working in a regime where all of those features are 
missing from our instrument model, so our classic LISA 
response is more like a two-detector co-located LIGO net- 
work operating at significantly lower frequencies than the 
existing ground-based detectors. 

More complete analysis will require either dramatically 
longer-duration numerical simulations or a suitably well- 
controlled wav of matching analytical and NR waveforms 
at low frequency that does not add as much systematic 
error as the template models being tested. We would also 
require more computationally efficient signal generation 
to successfully study template biases with long-duration 
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FIG. 7: Same as Fig. 1 now showing the LISA examples using M 3 x 10 7 Mq at SNR — 100. The time-domain waveform 
in the left-hand panel is for the <7 = 6 case. 
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TABLE IV: Same as Table I but for SMBII mergers as seen by LISA. Here we also include the residual SNR after the WAP 
waveform is regressed from the data. The p(EOB|EOB) study used the EOBhh model for each example. 


waveforms. 


VI. CONCLUSIONS AND DISCUSSION 

In this paper, we have produced the first measurement 
of systematic errors introduced by using EOB templates 
to analyze NR waveforms. Our study’s main focus was 
on stellar-mass BBHs observed by Advanced LIGO and 
Virgo. We also considered massive black- hole mergers 
detectable by space-based interferometers like LISA. We 
have injected NR waveforms into simulated data and 
have used an MCMC sampler to characterize the pos- 
terior distribution function for the astrophysical param- 
eters. Our metric for assessing the size of the systematic 
error is to compare the offset between the injected and 
best-fit parameters to the statistical error characterized 
by the standard deviation of the ID marginalized poste- 
riors. 

Several of our examples were chosen specifically to 
compare with Fig. 15 in Ref. [25]. Encouragingly, we find 
that the EOB waveforms accurately recover binary pa- 
rameters at SNRs higher than were predicted in Ref. [25] 
using the (deliberately) conservative accuracy require- 
ments of Refs. [40, 41]. 

For the stellar-mass systems we have investigated, 
when including the subdominant modes, we find system- 


atic biases consistently comparable to, or smaller than, 
the statistical errors for mass ratios up to q = 6 and 
SNRs < 50. We have tested these waveforms in the 
most stringent way possible, simulating high-SNR events 
where the merger (the least reliable part of the waveform 
calculations) is peaking in the most sensitive band of the 
detector, and the higher- frequency modes contribute sig- 
nificantly to the overall signal power. For the q = 6 
waveforms, the fraction of power contained in the sub- 
dominant modes is 11% and 16% for the M = 56 M© and 
M = 120M© systems, respectively. We also tested low- 
mass systems (M ~ 20 Mq) to better represent likely Ad- 
vanced LIGO/Virgo detections. In all of these examples, 
the bias introduced by the EOB waveform in Ref. [25] 
was at worst comparable to the statistical errors. 

Matched-filtering analyses are most sensitive to the 
phase of the signal, and it is the phase of the waveform 
that is most influenced by different choices of model. 
It is therefore predictable that the most significant bi- 
ases appear in the mass parameters. On the other hand, 
the extrinsic parameters have comparatively less impact 
on the shape of the signal — the distance comes in as 
an overall amplitude scaling, and the sky location is 
(for ground-based interferometers) predominantly deter- 
mined through triangulation based on time delays be- 
tween detectors. Therefore a model waveform used for 
parameter estimation has much more room for error if, 
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for instance, the location of the binary is the primary 
interest (say, for optical counterpart searches) and the 
requirements on the phase-matching are not as severe. 

While the results here are undoubtedly positive, there 
is still work to be done in waveform modeling. We are 
only testing the EOB waveforms over the last 30-40 GW 
cycles before merger and there is no guarantee that longer 
waveforms will not accumulate larger phase errors during 
the early portion of the inspiral. Furthermore, this study 
leaves off significant aspects of the waveform structure re- 
lated to black-hole spins and orbital eccentricity. A sim- 
ilar study will need to be performed with long-duration, 
spinning systems once both the NR and EOB waveforms 
are prepared for that test. It will also be valuable to test 
the EOB waveforms over a broader class of NR simula- 
tions, including those that were not used to calibrate the 
template model. Moreover, we have assumed in this pa- 
per that the NR waveforms were exact but, as discussed 
in Sec. II, this is not the case. One possibility for taking 
the numerical error into account is to inject NR wave- 
forms computed at different resolution and/or extracted 
at different radii and measure the EOB systematic bi- 
ases in each case. The difference between these biases 
can provide us with an estimate of the intrinsic error 
caused by the NR waveforms deviating from the exact 
solution in general relativity. Finally, we do not consider 
the effects of real detector noise or calibration errors in 
the data, both of which could prove to be a significant 
contribution to the overall error budget [56, 72]. 

While EOB waveforms that include subdominant 
modes were found to have relatively small systematic er- 
rors ir; parameter-estimation results for ground-based ob- 
servations with SNR < 50, proposed space-based instru- 
ments are sensitive to SMBH mergers with SNR > 100. 
For these scenarios, we find statistically significant bi- 
ases in the mass parameters for mass-ratios in the range 
2 < q < 6 observed at SNR = 100, on the order of ~ 1% 


for the component masses of the system. However, as 
discussed in Sec. V, systematic errors introduced by the 
EOB templates are small enough to still place strong con- 
straints on the population of massive black holes in the 
Universe. 

In the LISA examples, the residual power exceeded 
the SNR > 6 threshold above which the data analysis 
could be compromised both in parameter estimation of 
overlapping signals and studies aimed at testing general 
relativity. Space-based GW data analysis will require 
more accurate templates grounded in even more accurate 
numerical simulations. 

A more complete analysis is needed, but will require ei- 
ther dramatically longer-duration numerical simulations 
or a suitably well-controlled way of matching analyti- 
cal and NR waveforms at low frequency. We would also 
require more computationally efficient signal generation 
to successfully study template biases with long-duration 
waveforms. 
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